An Evaluation of Lexicalization in Parsing

Authors

  • Aravind K. Joshi
  • Yves Schabes
Abstract

In this paper, we evaluate a two-pass parsing strategy proposed for the so-called 'lexicalized' grammar. In 'lexicalized' grammars (Schabes, Abeillé and Joshi, 1988), each elementary structure is systematically associated with a lexical item called the anchor. These structures specify extended domains of locality (as compared to CFGs) over which constraints can be stated. The 'grammar' consists of a lexicon where each lexical item is associated with a finite number of structures for which that item is the anchor. There are no separate grammar rules. There are, of course, 'rules' which tell us how these structures are combined. A general two-pass parsing strategy for 'lexicalized' grammars follows naturally. In the first stage, the parser selects a set of elementary structures associated with the lexical items in the input sentence, and in the second stage the sentence is parsed with respect to this set. We evaluate this strategy with respect to two characteristics. First, the amount of filtering on the entire grammar is evaluated: once the first pass is performed, the parser uses only a subset of the grammar. Second, we evaluate the use of non-local information: the structures selected during the first pass encode the morphological value (and therefore the position in the string) of their anchor; this enables the parser to use non-local information to guide its search. We take Lexicalized Tree Adjoining Grammars as an instance of lexicalized grammar. We illustrate the organization of the grammar. Then we show how a general Earley-type TAG parser (Schabes and Joshi, 1988) can take advantage of lexicalization. Empirical data show that the filtering of the grammar and the non-local information provided by the two-pass strategy improve the performance of the parser.

1 LEXICALIZED GRAMMARS

Most current linguistic theories give lexical accounts of several phenomena that used to be considered purely syntactic. The information put in the lexicon is thereby increased in both amount and complexity: see, for example, lexical rules in LFG (Kaplan and Bresnan, 1983), GPSG (Gazdar, Klein, Pullum and Sag, 1985), HPSG (Pollard and Sag, 1987), Combinatory Categorial Grammars (Steedman 1985, 1988), Karttunen's version of Categorial Grammar (Karttunen 1986, 1988), some versions of GB theory (Chomsky 1981), and Lexicon Grammars (Gross 1984).

We say that a grammar is 'lexicalized' if it consists of:¹

  • a finite set of structures each associated with a lexical item; each lexical item will be called the anchor of the corresponding structure; the structures define the domain of locality over which constraints are specified; constraints are local with respect to their anchor;
  • an operation or operations for composing the structures.

Notice that Categorial Grammars (as used for example by Ades and Steedman, 1982 and Steedman, 1985 and 1988) are 'lexicalized' according to our definition since each basic category has a lexical item associated with it.

A general two-step parsing strategy for 'lexicalized' grammars follows naturally. In the first stage, the parser selects a set of elementary structures associated with the lexical items in the input sentence, and in the second stage the sentence is parsed with respect to this set. The strategy is independent of the nature of the elementary structures in the underlying grammar. In principle, any parsing algorithm can be used in the second stage.

1 By 'lexicalization' we mean that in each structure there is a lexical item that is realized. We do not mean simply adding feature structures (such as head) and unification equations to the rules of the formalism.
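
The two-stage strategy described above can be illustrated with a minimal Python sketch. It is not the Earley-type TAG parser of the paper: as a stand-in, it assumes a tiny categorial grammar (which the text notes is also 'lexicalized') and a CKY-style recognizer for the second stage. The lexicon contents and the names `LEXICON`, `Selected`, `first_pass`, `second_pass` and `recognize` are hypothetical and chosen only for illustration.

```python
from collections import namedtuple

# Hypothetical toy lexicon (not from the paper). Each word anchors a finite set
# of structures; a structure here is a categorial-grammar category, either an
# atom such as "NP" or a tuple (result, slash, argument).
IV = ("S", "\\", "NP")   # intransitive verb: looks left for an NP
TV = (IV, "/", "NP")     # transitive verb: looks right for an NP first
LEXICON = {
    "John": ["NP"], "Mary": ["NP"], "dog": ["N"],
    "the": [("NP", "/", "N")], "sleeps": [IV], "saw": [TV, "N"],
}

Selected = namedtuple("Selected", "anchor position structure")

def first_pass(words, lexicon=LEXICON):
    """Stage 1: keep only structures anchored by words of the input."""
    return [Selected(w, i, s)
            for i, w in enumerate(words)
            for s in lexicon.get(w, [])]

def second_pass(selected, n, goal="S"):
    """Stage 2: CKY-style recognition restricted to the selected structures."""
    chart = {(i, j): set() for i in range(n) for j in range(i + 1, n + 1)}
    for item in selected:
        # The recorded position tells the parser where each anchor sits.
        chart[(item.position, item.position + 1)].add(item.structure)
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):
                for left in chart[(i, k)]:
                    for right in chart[(k, j)]:
                        # Forward application: X/Y followed by Y gives X.
                        if isinstance(left, tuple) and left[1] == "/" and left[2] == right:
                            chart[(i, j)].add(left[0])
                        # Backward application: Y followed by X\Y gives X.
                        if isinstance(right, tuple) and right[1] == "\\" and right[2] == left:
                            chart[(i, j)].add(right[0])
    return goal in chart.get((0, n), set())

def recognize(sentence):
    words = sentence.split()
    return second_pass(first_pass(words), len(words))
```

In this toy setting, `first_pass("John saw Mary".split())` keeps 4 of the 7 structures in the lexicon, so the second stage never considers the entries for words absent from the sentence. Because each selected structure carries the position of its anchor, the second stage can place it directly in the chart instead of searching for where it might apply.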

Related resources

Minimally Lexicalized Dependency Parsing

Dependency structures do not have the information of phrase categories in phrase structure grammar. Thus, dependency parsing relies heavily on the lexical information of words. This paper discusses our investigation into the effectiveness of lexicalization in dependency parsing. Specifically, by restricting the degree of lexicalization in the training phase of a parser, we examine the change in...

Lexicalization in Crosslinguistic Probabilistic Parsing: The Case of French

This paper presents the first probabilistic parsing results for French, using the recently released French Treebank. We start with an unlexicalized PCFG as a baseline model, which is enriched to the level of Collins’ Model 2 by adding lexicalization and subcategorization. The lexicalized sister-head model and a bigram model are also tested, to deal with the flatness of the French Treebank. The ...

A Comparative Study of the Effect of Part-of-Speech Tagging on Parsing in Automatic Processing of the Persian Language

In this paper, the role of Part-of-Speech (POS) tagging for parsing in automatic processing of the Persian language is studied. To this end, the impact of the quality of POS tagging as well as the impact of the quantity of information available in the POS tags on parsing are studied. To reach the goals, three parsing scenarios are proposed and compared. In the first scenario, the parser assigns...

Capturing Disjunction In Lexicalization With Extensible Dependency Grammar

In spite of its potential for bidirectionality, Extensible Dependency Grammar (XDG) has so far been used almost exclusively for parsing. This paper represents one of the first steps towards an XDG-based integrated generation architecture by tackling what is arguably the most basic among generation tasks: lexicalization. Herein we present a constraint-based account of disjunction in lexicalizati...

L1 Glossing and Lexical Inferencing: Evaluation of the Overarching Issue of L1 Lexicalization

This empirical study reports on a cross-linguistic analysis of the overarching issue of L1 lexicalization regarding two (non)-interventionist approaches to vocabulary teaching. Participants were seventy four juniors at the Islamic Azad University, Roudehen Branch in Tehran. The investigation pursued (i) the impact of the provided (non)-interventionist treatments on both sets of (non)-lexicalize...

Statistical parsing for German: modeling syntactic properties and annotation differences

Statistical parsing research can be described as being anglo-centric: new models are first proposed for English parsing, and only then tested in other languages. Indeed, a standard approach to parsing with new treebanks is to adapt fully developed English parsing models to the other language. In this dissertation, however, we claim that many assumptions of English parsing do not generalize to o...

Publication date: 1989